interplay within peptides.

Biological question: protease cleavage pattern discovery

The data used in either protease cleavage pattern discovery or post-translational modification pattern discovery are peptides. A peptide is a subsequence cut from a long sequence and is normally between a [...] dozen residues long. There are two key problems in peptide pattern discovery: the polyprotein cleavage activity caused by proteases and the post-translational modification caused by chemicals. Many diverse functions within proteins are due to a wide distribution of these proteases and chemicals in nature. Both subjects have been extensively researched in protein science. For instance, the [...] cover many areas such as protein chemistry, proteomics, pharmaceutical manufacture, peptide mass fingerprinting, protein [...]ation, protein domain separation and protease activity recognition [...s, 1993].

A peptide used for protease cleavage pattern discovery is commonly expressed as $R_m \cdots R_2 R_1 R_1' R_2' \cdots R_n'$, where $R_i$ ($1 \le i \le m$) stands for one of the N-terminal residues and $R_j'$ ($1 \le j \le n$) stands for one of the C-terminal residues. The cleavage happens between $R_1$ and $R_1'$. A peptide used for post-translational modification pattern discovery is commonly expressed as $R_m \cdots R_2 R_1 R_0 R_1' \cdots R_n'$, where $R_0$ is the modification site. Most
machine learning models constructed for peptide pattern discovery treat residues as mutually independent variables. For instance, the binary encoding approach encodes each residue using a binary vector, and the bio-basis function introduced in Chapter 3 of this book employs a mutation matrix to align two peptides so as to encode peptides. The alignment score between two peptides is a linear sum of the pairwise mutation probabilities between the peptides. Therefore, potential residue interplay has not been considered in the modelling process of peptide pattern analysis using these encoding approaches.
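As a minimal sketch of what treating residues as independent means in practice, the Python below splits a peptide at its cleavage position, encodes it with one indicator vector per residue, and scores two peptides as a sum of position-wise terms. The alphabet ordering, the function names (`split_at_cleavage`, `binary_encode`, `identity_score`, `additive_score`) and the example peptides are assumptions made for illustration. In particular, `identity_score` is only a toy stand-in for an entry of a mutation matrix and is not the bio-basis function of Chapter 3.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def split_at_cleavage(peptide, m):
    """Split a peptide into its m N-terminal residues (R_m..R_1)
    and the remaining C-terminal residues (R_1'..R_n')."""
    if not 0 < m < len(peptide):
        raise ValueError("m must leave at least one residue on each side")
    return peptide[:m], peptide[m:]


def binary_encode(peptide):
    """Concatenate one 20-bit indicator vector per residue."""
    encoded = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1  # ValueError on an unknown residue
        encoded.extend(bits)
    return encoded


def identity_score(a, b):
    """Toy stand-in for a mutation-matrix entry: 1 if identical, else 0."""
    return 1 if a == b else 0


def additive_score(p, q, score=identity_score):
    """Sum position-wise scores; no term involves more than one position."""
    if len(p) != len(q):
        raise ValueError("peptides must have equal length")
    return sum(score(a, b) for a, b in zip(p, q))


def dot(u, v):
    """Plain inner product of two equal-length numeric lists."""
    return sum(x * y for x, y in zip(u, v))


p, q = "SQNYPIVQ", "SQNFPIVE"  # illustrative octapeptides (hypothetical data)
n_side, c_side = split_at_cleavage(p, 4)
same = additive_score(p, q)
overlap = dot(binary_encode(p), binary_encode(q))
print(n_side, c_side, same, overlap)
```

For these two peptides the additive score and the inner product of the binary encodings are both 6, the number of positions that carry the same residue. Every term in either sum depends on a single position, so no combination of residues at different positions can raise or lower the result, which is the sense in which such encodings leave residue interplay out of the model.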

In spite of the assumption of mutual independence between residues in most machine learning models, the research into residue interplay has